Abstract — Previous preliminary results on the application of 
knowledge networks to noise reduction in stationary harmonic 
and weakly chaotic signals are extended to more general cases. 
The formalism gives a novel algorithm from which statistical tests 
for the identification of deterministic behavior in noisy stationary 
time series can be constructed. 

Index Terms — Noise reduction, knowledge networks, signal 
processing, time series analysis. 



I. Introduction 

NOISE reduction and identification of underlying deterministic
behavior in signals are fundamental questions in fields like
communication [13], [16] and time series analysis [7]. A classical
model setup for the measurement of such signals [13], [16] considers
that each observation in a sequence $y_1, y_2, \ldots, y_i, \ldots, y_T$
can be decomposed as a sum of a deterministic component and a random
perturbation,

$$y_i \equiv y(t_i) = f(t_i) + \varepsilon(t_i). \qquad (1)$$

The random terms $\varepsilon(t_i)$ are statistically independent from
measurement to measurement and independent of $f$. Consider a clean
signal that can be adequately modeled by a linear combination of the
form

$$f(t_i) = \sum_{l=1}^{L} a_l \, \varphi(b_l t_i + c_l), \qquad (2)$$

where the $\varphi$'s are members of an orthogonal basis of functions.
In the present context, "adequately modeled" refers to the consistency
of $f$ with Eq. (1), in the following sense: if $f$ is approximated
through the optimization of some suitable risk or likelihood function
in a finite sample, then the residuals should behave like independent
random variables. Additionally, if the resulting form of $f$ is
expected to be useful for prediction purposes, then $f$ should have the
same consistency outside the original sample as well, satisfying a
suitable goodness criterion. In general, the specific nature of the
functional basis for $f$ is hidden. For instance, the number of
components needed to describe the signal, $L$, is usually unknown
beforehand. Prior to any attempt at fitting the data to $f$, the model
complexity should be defined. For the setup given by Equations (1) and
(2), $L$ gives a quantity that measures the model complexity.
The estimation of $L$ is closely related to the separation of the
signal from the noise. In order to see this, consider the case in
which $y(t)$ is stationary and $\langle y \rangle = 0$, where the
brackets stand for statistical average. The variance of $y$ is in this
case written as

$$\langle y^2 \rangle = \sum_{l=1}^{L} \sum_{l'=1}^{L} \langle a_l a_{l'} \, \varphi(b_l t + c_l) \, \varphi(b_{l'} t + c_{l'}) \rangle + \langle \varepsilon^2 \rangle. \qquad (3)$$

The model complexity $L$ could therefore be estimated from the
knowledge of the noise amplitude and some statistical aspects of the
components of the basis.

The purpose of the present contribution is to give a novel method for
the estimation of the complexity of signal models, which in turn
introduces a new framework to deal with noise reduction. The main
concern regarding the application of the formalism is with cases in
which the noise is strong, that is, with a variance comparable to the
variance of the clean signal. The proposed approach is valuable for
the characterization of deterministic signals under strong
stochasticity. In many important fields of application, like analysis
of geophysical data, voice recognition, time series of economic,
ecological or clinical origin, etc., the identification of
deterministic behavior is difficult due to the presence of strong
additive noise or insufficient sample size. These difficulties are
particularly evident for the identification and characterization of
low dimensional chaotic behavior in noisy time series. The algorithm
introduced here tackles these questions for several important cases.
The procedure is linear, yet it is able to perform signal analysis
tasks that are beyond the capabilities of traditional linear noise
reduction techniques.

A. Knowledge Networks 

The proposed method relies on the notion of a knowledge network [1],
[8]. Knowledge networks were originally motivated by the study of some
particular structures that arise in economics and biology, like
interactions between consumers and products in a market or
protein-substrate interactions [8], [9]. A knowledge network is
defined as a network in which the nodes are characterized by $L$
internal degrees of freedom, while the edges carry scalar products of
the vectors on the two nodes they connect [8]. In order to fix ideas,
consider the following knowledge network model of opinion formation
[1], [8]: suppose that there exists a database of opinions given by
agents on a given set of products. This database can be seen as a
sparse matrix, with holes corresponding to missing opinions (say,
agents that have never been exposed to a given product). In
geometrical terms, the preferences of an agent are represented as a
vector in a hypothetical taste space, whose dimension and basis
vectors are generally unknown. A product is represented by a similar
vector of qualities. An agent's opinion on a given product is assumed
to be proportional to the overlap between preferences and qualities,
which can be expressed by the scalar product between the corresponding
vectors. Therefore, products act like a basis, and opinions as the
agent's coordinates on such a basis. Consider a population of $M$
agents interacting with $N$ products. The two sets of vectors lie in
an $L$-dimensional space, $\mathbf{a}_n = (a_n^1, a_n^2, \ldots, a_n^L)$
and $\mathbf{b}_m = (b_m^1, b_m^2, \ldots, b_m^L)$, where
$n = 1, 2, \ldots, N$ and $m = 1, 2, \ldots, M$. In this way the
overlap $y_{m,n} = \mathbf{b}_m \cdot \mathbf{a}_n$ represents the
opinion of agent $\mathbf{b}_m$ on product $\mathbf{a}_n$. Only the
overlaps $y_{m,n}$ are directly observable. The issue is then to
reconstruct the hidden quantities from a known fraction of the scalar
products. For the case in which $L$ is known, Maslov and Zhang [8]
have shown the existence of thresholds for the fraction $p$ of known
overlaps, above which it is possible to reconstruct the missing
information to different extents. Bagnoli, Berrones and Franci [1]
have generalized the study of Maslov and Zhang to the case in which
the dimensionality $L$ is unknown. The present work mainly relies on
this last approach, so a brief summary of the results of Bagnoli,
Berrones and Franci is now presented.
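The geometric picture above can be illustrated with a minimal Python sketch. It assumes standard normal hidden components, which is a choice made here only for illustration; the function names `make_overlaps` and `mean_square_overlap` and the masking probability `p` are hypothetical and not taken from the source. The sketch builds the opinion matrix from hidden taste and quality vectors and hides a fraction of the entries to mimic a sparse database.

```python
import random


def make_overlaps(M, N, L, p=1.0, seed=0):
    """Return hidden vectors and the M x N opinion matrix y[m][n] = b_m . a_n.

    Each opinion is observed with probability p; missing ones are None.
    Hidden components are standard normal (an illustrative assumption).
    """
    rng = random.Random(seed)
    agents = [[rng.gauss(0.0, 1.0) for _ in range(L)] for _ in range(M)]
    products = [[rng.gauss(0.0, 1.0) for _ in range(L)] for _ in range(N)]
    y = []
    for m in range(M):
        row = []
        for n in range(N):
            overlap = sum(b * a for b, a in zip(agents[m], products[n]))
            row.append(overlap if rng.random() < p else None)
        y.append(row)
    return agents, products, y


def mean_square_overlap(y):
    """Mean square of the observed (non-missing) overlaps."""
    observed = [v for row in y for v in row if v is not None]
    return sum(v * v for v in observed) / len(observed)
```

With zero-mean, unit-variance components the mean square of the observed overlaps should be close to $L$, which gives a quick sanity check on the construction.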

Suppose that the components of $\mathbf{b}_m$ and $\mathbf{a}_n$ are
random variables distributed according to

$$P(a_n^l, b_m^l) = P_{n,l}(a) \, P_{m,l}(b), \qquad (4)$$

and define $\langle h \rangle$ as the average, computed in the
thermodynamic limit, over $P(a_n^l, b_m^l)$ of an arbitrary function
$h(a_n^l, b_m^l)$. For a set of hidden components distributed
according to Eq. (4), the $y$'s are uncorrelated in the thermodynamic
limit. However, correlations arise because $L$ is finite.

In order to keep the expressions simple, it is assumed that
$\langle a_n^l \rangle = \langle b_m^l \rangle = 0$. Averaging over
the distribution (4), the variance of the overlaps is written as

$$\langle y^2 \rangle = L \langle a^2 \rangle \langle b^2 \rangle. \qquad (5)$$

For this model setup, Bagnoli, Berrones and Franci [1] have shown that
any overlap can be expressed in terms of a weighted average of other
overlaps,

$$y_{m,n} = \frac{L}{M-1} \sum_{i=1,\, i \neq m}^{M} C_{m,i} \, y_{i,n} + \varepsilon_{L,M,N}, \qquad (6)$$

where $C_{i,j}$ is the correlation between $y_i$ and $y_j$,
specifically, the correlation calculated over the expressed opinions
of agents $i$ and $j$ on different products. This correlation
asymptotically goes to the overlap between the corresponding vectors
of agents' tastes. The hidden quantity $L$ can be extracted by fitting
the proportionality factor $L/(M-1)$.

The error term $\varepsilon$ is at first order given by

[Eq. (7): not legible in the source.]

An aspect of this formalism that is important for applications is that
there is no necessity to have a fully connected opinion matrix. The
results are extended to sparse datasets simply by redefining the
parameters $M$ and $N$ as functions of the pair $(m,n)$. In this way
$M_n$ represents the available number of opinions over product $n$
given by any agent and $N_m$ is the number of opinions expressed by
agent $m$ regarding any product [1].

B. Knowledge Networks and Signal Models

As already pointed out in [3], a knowledge network framework for
signals such as those described by Eqs. (1) and (2) can be built for
certain classes of stationary signals. The essential point is the
assumption that a distribution for the components of the signal model
exists, analogous to distribution (4). If $N$ time ordered subsamples
of size $M$ are extracted from the observed sequence
$y_1, y_2, \ldots, y_i, \ldots, y_T$, we refer to $y_{m,n}$ as the
measured value at time $m$ in subsample $n$, with
$n = 1, 2, \ldots, N$ and $m = 1, 2, \ldots, M$. The distribution of
the components of $y_{m,n}$ is assumed to be

$$P(a_{n,l}, \varphi_{m,n,l}) = P_{n,l}(a) \, P_{m,n,l}(\varphi). \qquad (8)$$

In order to see how a distribution $P(a_{n,l}, \varphi_{m,n,l})$ can
arise for the problem at hand, note that from Equations (1) and (2) it
follows that

$$y_{m,n} = \sum_{l=1}^{L} a_{n,l} \, \varphi(m b_{n,l} + c_{n,l}) + \varepsilon_{m,n}. \qquad (9)$$

For fixed $L$, the parameters $a_{n,l}$, $b_{n,l}$ and $c_{n,l}$ are
chosen to be optimal in the given sample with respect to some suitable
risk or likelihood function [4]. Due to the noise and to the finite
sample size, the chosen parameters fluctuate from sample to sample,
giving rise to a distribution of the form
$P(a_{n,l}, \varphi_{m,n,l})$.

In the next Section a formalism for noise reduction in signals is
built under the assumption (8). The close connection between the
problem of noise reduction and estimation of model complexity is
shown, leading to a new technique for model complexity estimation in
stationary signals. In Section III the resulting algorithm is
numerically tested on several examples that are relevant to important
potential applications. Final remarks and a brief discussion of future
work are given in Section IV.

II. Noise Reduction by Knowledge Networks

Consider the following linear transformation of the components

[Eqs. (10) and (11), which define the transformed components
$\phi_{m,l}$ and $\alpha_{n,l}$, are not legible in the source.]

Introducing the definitions

$$A = \begin{pmatrix} \alpha_{1,1} & \cdots & \alpha_{N,1} \\ \vdots & & \vdots \\ \alpha_{1,L} & \cdots & \alpha_{N,L} \end{pmatrix}, \qquad \Phi = \begin{pmatrix} \phi_{1,1} & \cdots & \phi_{1,L} \\ \vdots & & \vdots \\ \phi_{M,1} & \cdots & \phi_{M,L} \end{pmatrix} \qquad (12)$$

and

$$\Gamma = \begin{pmatrix} \varepsilon_{1,1} & \cdots & \varepsilon_{1,N} \\ \vdots & & \vdots \\ \varepsilon_{M,1} & \cdots & \varepsilon_{M,N} \end{pmatrix}, \qquad (13)$$

the model setup given by Eq. (9) can be written in matrix form as

$$Y = \Phi A + \Gamma. \qquad (14)$$

In the limit $N \to \infty$ the operation $AA^T$ goes to

$$AA^T = N \, \mathrm{diag}\left( \langle \alpha_1^2 \rangle, \ldots, \langle \alpha_L^2 \rangle \right). \qquad (15)$$

In the same way, in the limit $M \to \infty$,

$$\Phi^T \Phi = M \, \mathrm{diag}\left( \langle \phi_1^2 \rangle, \ldots, \langle \phi_L^2 \rangle \right). \qquad (16)$$

The form of the diagonal elements in Eq. (16) follows from an
additional ergodicity assumption: the average
$\langle \phi_l^2 \rangle$ can be equivalently taken over infinitely
many finite samples or over a single sample of infinite length. For
stationary signals the validity of this assumption is straightforward.

Consider the operation

$$\hat{Y} = \frac{k}{M} C Y, \qquad (17)$$

where $C$ is the correlation matrix of the $y$'s. It is now shown that
if $\langle \alpha_l^2 \rangle = \langle \alpha^2 \rangle$ and
$\langle \phi_l^2 \rangle = \langle \phi^2 \rangle$, that is, if the
variability due to finite sample size, discrete sampling and noise
affects all of the components in the same way, then
$\hat{Y} = Y - \Gamma$ in the limit $N \to \infty$, $M \to \infty$,
using a suitable value for the factor $k$. The formula (17) is
expanded as

$$\hat{Y} = \frac{k}{M} \frac{Y Y^T}{N \langle y^2 \rangle} Y = \frac{k \, [\Phi A + \Gamma][A^T \Phi^T + \Gamma^T]}{M N [L \langle \alpha^2 \rangle \langle \phi^2 \rangle + \langle \varepsilon^2 \rangle]} Y = \frac{k \, [\Phi A A^T \Phi^T \Phi A + \Gamma \Gamma^T \Phi A]}{M N [L \langle \alpha^2 \rangle \langle \phi^2 \rangle + \langle \varepsilon^2 \rangle]}. \qquad (18)$$

Introducing the results (15) and (16) into Eq. (18), and using
$\Gamma \Gamma^T \to N \langle \varepsilon^2 \rangle I$ for
independent noise terms,

$$\hat{Y} = k \, \frac{M \langle \alpha^2 \rangle \langle \phi^2 \rangle + \langle \varepsilon^2 \rangle}{M [L \langle \alpha^2 \rangle \langle \phi^2 \rangle + \langle \varepsilon^2 \rangle]} \, \Phi A. \qquad (19)$$

The factor $k$ must therefore be chosen as

$$k = \frac{M [L \langle \alpha^2 \rangle \langle \phi^2 \rangle + \langle \varepsilon^2 \rangle]}{M \langle \alpha^2 \rangle \langle \phi^2 \rangle + \langle \varepsilon^2 \rangle}. \qquad (20)$$

The fluctuations of the observable $y(t)$ can be decomposed as

$$\langle y^2 \rangle = L \langle \alpha^2 \rangle \langle \phi^2 \rangle + \langle \varepsilon^2 \rangle. \qquad (21)$$

Introducing Eq. (21) into Eq. (20), an expression for $L$ in terms of
measurable quantities is found,

$$L = \frac{a M \left( \langle y^2 \rangle - \langle \varepsilon^2 \rangle \right)}{\langle y^2 \rangle - a \langle \varepsilon^2 \rangle}, \qquad (22)$$

where $a = k/M$. In order to see how the terms appearing on the right
of Eq. (22) are measured, consider the following algorithm for noise
reduction and estimation of the optimum complexity in models for
stationary signals. The anticipation formula in this case reads

$$\hat{y}(t_i) = \frac{k}{M} \sum_{h=1,\, h \neq i}^{M} C(t_i - t_h) \, y(t_h). \qquad (23)$$

The signal is processed performing the following steps:

i) Calculate the autocorrelation function $C(\tau)$.

ii) Perform a least squares fit over a sample of $M$ consecutive
points to estimate the factor $a = k/M$ in Eq. (23). The least squares
problem can be solved exactly, giving

$$a = \frac{\sum_{i=1}^{M} y(t_i) \sum_{h \neq i} C(t_i - t_h) \, y(t_h)}{\sum_{i=1}^{M} \left[ \sum_{h \neq i} C(t_i - t_h) \, y(t_h) \right]^2}, \qquad (24)$$

with $M$ less than or equal to one half of the total length of the
signal. The term $\langle \varepsilon^2 \rangle$ is estimated after
the filtering, using the filtered data as an approximation of the
underlying deterministic signal and performing the subtraction
$\langle \varepsilon^2 \rangle = \langle y^2 \rangle - \langle \hat{y}^2 \rangle$.

iii) By the use of Eq. (22), calculate $L$ in terms of observable
quantities.

The steps i) - iii) define what hereafter is called the Knowledge
Network Noise Reduction (KNNR) algorithm.
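Steps i) to iii) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the autocorrelation is computed directly rather than by FFT, the anticipation formula is taken as a correlation-weighted sum over the other points of the window with a single least squares factor `a`, the noise variance is taken as the difference between the input and filtered variances, and $L$ is obtained by combining Eqs. (20) and (21) as $L = aM(\langle y^2 \rangle - \langle \varepsilon^2 \rangle)/(\langle y^2 \rangle - a \langle \varepsilon^2 \rangle)$ with $a = k/M$. The function names are hypothetical.

```python
def autocorrelation(y, max_lag):
    """Normalised sample autocorrelation C(tau), tau = 0..max_lag, C(0) = 1."""
    T = len(y)
    var = sum(v * v for v in y) / T
    return [
        sum(y[t] * y[t + tau] for t in range(T - tau)) / ((T - tau) * var)
        for tau in range(max_lag + 1)
    ]


def knnr(y, M):
    """Sketch of the KNNR steps on the last M points of y (M <= len(y) // 2).

    Returns (filtered_window, a, L). All formulas are assumptions stated
    in the accompanying text, reconstructed for illustration only.
    """
    T = len(y)
    mean = sum(y) / T
    y = [v - mean for v in y]  # the input is mean-subtracted

    # Step i): autocorrelation function of the whole signal.
    C = autocorrelation(y, M - 1)

    # Step ii): correlation-weighted sums over the other points of the window,
    # followed by an exact one-parameter least squares fit for a.
    w = y[T - M:]
    z = [sum(C[abs(i - h)] * w[h] for h in range(M) if h != i) for i in range(M)]
    a = sum(wi * zi for wi, zi in zip(w, z)) / sum(zi * zi for zi in z)
    filtered = [a * zi for zi in z]

    # Noise variance estimated as input variance minus filtered variance.
    var_y = sum(v * v for v in w) / M
    var_f = sum(v * v for v in filtered) / M
    var_eps = var_y - var_f

    # Step iii): model complexity from the observable quantities.
    L = a * M * (var_y - var_eps) / (var_y - a * var_eps)
    return filtered, a, L
```

For a noisy sinusoid, a call such as `knnr(noisy, 256)` on a series of 2048 points should return a filtered window that is closer to the clean signal than the input, together with a small value of `L`. The direct sums cost on the order of `M * len(y)` operations, so the FFT route mentioned in the text is preferable for long signals.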
III. Examples

The KNNR algorithm is tested on data generated numerically, adding at
each time step a Gaussian white noise term $\varepsilon(t)$ to a
deterministic function $f(t)$. The simulation of the noise is based on
the L'Ecuyer algorithm, which is known to perform adequately with
respect to the main statistical tests, and to produce sequences of
random numbers with length $\sim 10^{18}$ [11]. The noisy data
$y(t) = f(t) + \varepsilon(t)$ enters as input for the KNNR algorithm.
By the use of the Fast Fourier Transform of the input [11], the
autocorrelation function is calculated for a maximum lag equal to one
half of the total length of the signal. The steps ii) and iii) of the
KNNR algorithm are then performed over the second half of the input.

The capabilities for noise reduction in harmonic and weakly 
chaotic time series of the proposed method have already been 
discussed in [3]. 

The KNNR framework provides a characterization of the signal model
complexity in terms of $L$, the number of member functions of a
certain orthogonal basis needed to describe the signal, if it is
indeed separable into a deterministic component and a white noise
term. If the necessary assumptions are met, $L$ should converge to a
finite value as the sample size grows. This fact can be used to
identify underlying deterministic behavior.

In the next examples the KNNR approach is tested on several chaotic
systems, with and without additive noise, and for comparison purposes,
on purely stochastic systems as well. The mean value of the signals is
subtracted before they enter as input in the KNNR algorithm. The
examples with a deterministic part are therefore constructed by

$$y_i = s_i + \varepsilon_i - \langle y \rangle, \qquad (25)$$

where $y_i$ is the input and $s_i$ is given by the iteration of a
nonlinear discrete map. Each of the noise terms $\varepsilon_i$ is
independently drawn from a Gaussian distribution.

The KNNR algorithm is capable of performing tasks that are beyond the
scope of traditional linear signal processing techniques. For
instance, with large enough sample size, the KNNR algorithm is able to
identify nonlinear behavior in signals whose power spectrum is
consistent with a correlated stochastic process. This identification
is not possible by classical approaches like the Wiener filter [16],
which relies on a clear separation between oscillatory and noise
components in the spectrum. More recent methods, like surrogate data
[15] or nonlinear techniques [2], on the other hand, do not give a
comprehensive framework to deal with noise reduction and
identification of determinism on common ground.

A. The Logistic Map 

An archetypal example of a simple nonlinear system capable of chaotic
behavior is given by the logistic map [10]

$$s_i = r s_{i-1} (1 - s_{i-1}). \qquad (26)$$

With a parameter value of $r = 3.6$ and initial conditions in the
interval (0, 1), the map (26) displays a weakly chaotic behavior,
close to quasi-periodic motion. As already discussed in [3], in this
case $L \approx 2$, indicating that with this low model complexity it
is possible to accomplish the separation dictated by Eqs. (1) and (2).

A case with $r = 3.7$ and the initial condition in the interval (0, 1)
is analyzed with the KNNR algorithm. The map is perturbed by a
Gaussian white noise with a variance of 0.2, essentially the same
variance as the clean signal.
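The construction of such a test input can be sketched as follows. This is an illustrative sketch only: it uses Python's built-in Mersenne Twister generator instead of the L'Ecuyer algorithm cited in the text, the `burn_in` transient is an assumption not mentioned in the source, and the function names are hypothetical.

```python
import random


def logistic_series(r, s0, n, burn_in=1000):
    """Iterate s_i = r * s_{i-1} * (1 - s_{i-1}) and return n points.

    burn_in iterations are discarded first (an assumption of this sketch).
    """
    s = s0
    for _ in range(burn_in):
        s = r * s * (1.0 - s)
    out = []
    for _ in range(n):
        s = r * s * (1.0 - s)
        out.append(s)
    return out


def noisy_input(clean, noise_std, seed=0):
    """Add independent Gaussian noise and subtract the sample mean."""
    rng = random.Random(seed)
    y = [s + rng.gauss(0.0, noise_std) for s in clean]
    mean = sum(y) / len(y)
    return [v - mean for v in y]
```

A call such as `noisy_input(logistic_series(3.7, 0.3, 16384), noise_std)` then yields a zero-mean series of the length used in the examples; `noise_std` is left as a parameter so that the noise level quoted in the text can be set by the reader.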

The power spectrum taken from a sample of 16384 points of the input
signal is presented in Fig. 1. Besides the presence of some relevant
peaks at high frequencies, the spectrum is basically that of white
noise.

Fig. 1. Log-log plot of the power spectrum of the perturbed logistic map.

Segments of the noisy, clean and filtered time series are shown in
Fig. 2. In order to present all the data in the same graph, suitable
constants have been added to the mean values of the signals.

Fig. 2. Noise reduction by the KNNR algorithm for a strongly perturbed
logistic map.

In Fig. 3 the values of $L$ for increasing sample size are plotted. A
least squares fit of the resulting data is performed with respect to
the formula

LM = L-aM-^, a>0. (27) 

Fig. 3. Convergence of $L$ for the perturbed logistic map.

The type of convergence given in Eq. (27) is suggested by the first
order error term, Eq. (7), in the anticipation formula of the original
Bagnoli, Berrones and Franci setup. This behavior of errors is
obtained for the case in which the basis components are independent
random variables. The fundamental point in the derivation of Eq. (7)
is that the fluctuations of these components sum in accordance with
the Central Limit Theorem [1]. The numerical results suggest that for
strongly chaotic systems this condition holds. In this example the
number of
hidden components converge to a value of order L ^ 10^. 

The convergence of $L$ constitutes a basis for a novel technique for
the identification of chaos and other types of deterministic behavior
in time series. In real world problems, the availability of
arbitrarily large samples is a rare luxury. The convergence of $L$ can
however be assessed indirectly, through the parameter $a$ that appears
in formula (22). As the sample size grows, the variance terms of
Eq. (22) tend to be constant. In order to have a finite value for $L$,
$a$ must decrease with $M$. Of course, asymptotically
$a \propto M^{-1}$. The particular way in which this asymptotic
behavior is attained is unknown. By a smoothness assumption, a
decreasing behavior of $a$ can however be expected for a range of
sample sizes. Note that this claim is consistent with the curve shown
in Fig. 3. On the other hand, according to the evidence presented in
Subsection III-D, $L$ diverges asymptotically in a linear way with
sample size for linear stochastic processes with finite correlation
lengths. On these grounds, the proposed test for determinism is a
standard F-test applied to $\log[a(M)]$. Consider the model
$\log(a) = -\beta \log(M) + c$, where $\beta$ is a positive number and
$c$ is a real number. These parameters are given by fitting the linear
model to data. The null hypothesis is that $\beta = 0$. Numerical
results indicate that the proposed test gives a reliable
identification with input signals of moderate length. In this and all
of the following examples the F-test is performed over a set of values
of the parameter $a$ calculated through the KNNR algorithm for noisy
signals with sample sizes of 64, 128, 256, 512, 1024, 2048, 4096, 8192
and 16384.

In Fig. 4, $\log(a)$ is plotted against $\log(M)$, where log stands
for the natural logarithm. It turns out that
$F = 42 \gg F_{0.05}(1,7) = 5.59$, so the null hypothesis is clearly
rejected at a 95% confidence level.
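The test described above amounts to a simple linear regression of $\log(a)$ on $\log(M)$ followed by the usual regression F statistic with (1, n - 2) degrees of freedom. A minimal sketch, assuming ordinary least squares and with the hypothetical function name `determinism_f_test`, is given below; the critical value 5.59 for nine sample sizes is the one quoted in the text and is not recomputed here.

```python
import math


def determinism_f_test(sample_sizes, a_values):
    """Fit log(a) = -beta * log(M) + c and return (beta, c, F).

    F is the regression F statistic with (1, n - 2) degrees of freedom,
    to be compared with a tabulated critical value.
    """
    x = [math.log(m) for m in sample_sizes]
    y = [math.log(a) for a in a_values]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    # Residual and explained sums of squares of the linear fit.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ssr = slope * sxy
    F = ssr / (sse / (n - 2))
    return -slope, intercept, F
```

With the nine sample sizes listed in the text, the null hypothesis $\beta = 0$ would be rejected at the 95% level when the returned `F` exceeds 5.59. The function divides by the residual sum of squares, so it is not meant for data lying exactly on a straight line.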



Fig. 4. Behavior of $a$ with increasing $M$ for the noisy logistic map.

B. The Henon Map

A famous two dimensional extension of the logistic map is the system
introduced by Henon [5],

$$s_i = a - s_{i-1}^2 + b s_{i-2}. \qquad (28)$$

The canonical values $a = 1.4$, $b = 0.3$ are taken. The iteration of
the map (28) is started from the initial conditions $s_0 = 0.5$,
$s_1 = 0.5$. The KNNR algorithm is applied to a case in the presence
of noise with variance 1.2 (the variance of the clean signal is 1).
The knowledge network algorithm performs a satisfactory noise
reduction of the input signal. Fig. 5 shows the power spectrum of the
clean, noisy and filtered signals in semilog scale. The input has a
length of 16384 points. The filtered signal captures the overall shape
of the clean spectrum.

For the noisy Henon system the F-test again indicates convergence in
$L$ at a 95% confidence level. It is found that
$F = 7.2 > F_{0.05}(1,7) = 5.59$.
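A short sketch of how the clean Henon signal could be generated is given below. It assumes the scalar two-step form $s_i = a - s_{i-1}^2 + b s_{i-2}$ with two initial values equal to 0.5, which is how the map is read here; the function name `henon_series` is hypothetical.

```python
def henon_series(n, a=1.4, b=0.3, s0=0.5, s1=0.5):
    """Iterate s_i = a - s_{i-1}**2 + b * s_{i-2} and return n new points.

    The two-step scalar form and the second initial value are assumptions
    of this sketch.
    """
    prev2, prev1 = s0, s1
    out = []
    for _ in range(n):
        s = a - prev1 * prev1 + b * prev2
        out.append(s)
        prev2, prev1 = prev1, s
    return out
```

In this form the orbit stays on a bounded attractor and its variance is close to 1, in line with the value quoted for the clean signal. Noise can then be added and the mean subtracted as in Eq. (25).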

C. The Intermittency Map 

In this example the deterministic part of the input is generated by
the iteration of the intermittency map,

$$s_i = \begin{cases} \beta + s_{i-1} + c \, s_{i-1}^m, & 0 < s_{i-1} \le d, \\ \dfrac{s_{i-1} - d}{1 - d}, & d < s_{i-1} < 1, \end{cases} \qquad c = \frac{1 - \beta - d}{d^m}. \qquad (29)$$

The map (29) is related to several models that arise in the study of
the phenomenon of intermittency found in turbulent fluids [14].
Recently, the map (29) has been proposed as a model for the long term
dynamics of packet traffic in telecommunication networks [6].
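The iteration of the intermittency map can be sketched as follows. Only the first branch and the constant $c$ are legible in this copy of the text; the second branch used here, $(s - d)/(1 - d)$ for $d < s < 1$, is the form commonly associated with this map in the traffic modeling literature and should be read as an assumption of the sketch. The function name `intermittency_series` is hypothetical.

```python
def intermittency_series(n, beta, m=2, d=0.7, s0=0.01):
    """Iterate the intermittency map and return n points.

    Branch 1 (0 < s <= d): s -> beta + s + c * s**m, c = (1 - beta - d) / d**m.
    Branch 2 (d < s < 1):  s -> (s - d) / (1 - d)   (assumed form).
    """
    c = (1.0 - beta - d) / d ** m
    s = s0
    out = []
    for _ in range(n):
        if s <= d:
            s = beta + s + c * s ** m
        else:
            s = (s - d) / (1.0 - d)
        out.append(s)
    return out
```

The constant $c$ makes the first branch reach 1 exactly at $s = d$, so both branches map the unit interval into itself. Small values of `beta` keep the orbit near the origin for long stretches, which is the source of the slowly decaying correlations.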

Depending on the parameters, the system (29) can display spectral
properties that range from white noise to $1/f$ noise.

The values for the parameters $m$ and $d$ considered here are $m = 2$,
$d = 0.7$. The initial condition is taken as 0.01. Two different cases
are studied:

i) $\beta = 0.05$.
With this choice of the parameters the map generates a signal with
rapidly decaying correlations. The short-term correlations are
reflected in the fact that the spectrum is that of white noise for
frequencies smaller than ~ 0.1, as shown in Fig. 6. The same chaotic
system in the presence of additive noise is also considered in Fig. 6.
The noise values are independently drawn from a Gaussian distribution
with standard deviation of 0.4 (the standard deviation of the clean
signal is 0.26). The perturbed chaotic signal enters as input in the
KNNR algorithm. Fig. 7 shows how the KNNR algorithm is capable of
reducing the noise considerably. Moreover, the filtered signal has
spectral properties similar to those of the clean signal, despite the
fact that the noisy data displays an almost flat spectrum at all
frequencies.

The behavior of $a$ calculated from samples of the noisy signal with
increasing sample size is shown in Fig. 8. The F-test gives
$F = 12.8 > F_{0.05}(1,7) = 5.59$.

Fig. 5. Semilog plots of the power spectra of a signal generated by the
Henon map: (a) noisy case, (b) clean signal, (c) filtered signal.



Fig. 6. Log-log plots of the power spectra of a signal generated by the
intermittency map ($\beta = 0.05$): (a) noisy case, (b) clean signal,
(c) filtered signal.

ii) $\beta = 0.0005$.

In this case the correlations decay much more slowly. The 
mean squares fit of the power spectrum of the clean signal 
to a power law indicates P(/) oc f~^'^^, with a crossover to 
white noise at frequencies ~ 0.001. 

Noise reduction is performed on this map in the presence of
independent Gaussian perturbations, taken from a distribution with
standard deviation of 0.5, a value that almost doubles the standard
deviation of the clean signal, which is 0.26. Fig. 9 makes clear how
the KNNR algorithm is in this case capable of extracting the
essentially correct spectral properties from a very noisy input
signal. While the noisy signal has a power spectrum described by
$P(f) \propto$ /^°'^, which is close to the spectrum of white noise,
the fit of the spectrum of the filtered signal to a power law
indicates $P(f) \propto$ /~^'^^.

The application of the F-test to successive values of $a$ calculated
by the KNNR algorithm with the noisy signal as input gives evidence of
the convergence to a finite $L$. It is found that
$F = 13 > F_{0.05}(1,7) = 5.59$.

Fig. 7. Noise reduction for a strongly perturbed intermittency map
($\beta = 0.05$).

Fig. 8. Behavior of $a$ with increasing $M$ for the noisy intermittency
map ($\beta = 0.05$).

D. White Noise and Ornstein-Uhlenbeck Processes 

In contrast to deterministic systems, even chaotic ones, stochastic
systems do not display a convergence in $L$ with increasing
observation time. The numerical experiments indicate that for signals
generated by stochastic processes with a finite correlation length,
$L$ asymptotically grows linearly with sample size.

The KNNR algorithm is applied to signals generated by discrete
analogues of the white noise and Ornstein-Uhlenbeck processes:
sequences of independent random numbers and the AR(1) process,
respectively.

A sequence of independent Gaussian deviates is generated by the
already mentioned L'Ecuyer algorithm. Fig. 10 presents the behavior of
the model complexity for a signal in which the random numbers are
drawn from a distribution with standard deviation of 0.23. The number
$L$ diverges linearly. Performing an F-test in the same way as before
(Fig. 11) gives $F = 0.25 \ll F_{0.05}(1,7) = 5.59$, which indicates
that the hypothesis of a constant $a$ cannot be rejected at the 95%
confidence level.



Fig. 9. Log-log plots of the power spectra of a signal generated by the
intermittency map ($\beta = 0.0005$): (a) noisy case, (b) clean signal,
(c) filtered signal. The clean and filtered signals display very
similar spectral properties, while the noisy signal is close to white
noise.

Fig. 11. Behavior of $a$ with increasing $M$ for a sequence of
independent random numbers.

In Fig. 12 the values of $L$ for increasing sample size are plotted
for three different realizations of an AR(1) process of the form

$$y_t = y_{t-1} - \lambda y_{t-1} + \varepsilon_t, \qquad 0 < \lambda < 1. \qquad (30)$$

The term $\varepsilon_t$ is again a Gaussian deviate generated by the
L'Ecuyer algorithm with a standard deviation of 0.23. In the limit in
which the parameter $\lambda$ goes to zero the process (30) tends to a
random walk. For other values of $\lambda$ the correlations decay
exponentially, with a characteristic time $1/\lambda$. The examples
considered in Fig. 12 have the parameter values $\lambda = 0.5$,
$\lambda = 0.05$ and $\lambda = 0.005$. Note that $L$ eventually
diverges linearly for all cases. The point at which this divergent
regime is attained depends on the correlation length. In fact, the
F-test performed for a maximum sample size of 16384 rejects the null
hypothesis at a 95% confidence level only in the first two cases. This
implies that for small enough samples, linear autocorrelated
stochastic processes are indistinguishable by the KNNR algorithm from
chaotic systems with similar autocorrelations. This is quite natural
taking into account that the KNNR algorithm is based on the linear
autocorrelation structure. It must be pointed out, however, that if
the stochastic signal at hand has finite correlation length, the
numerical experiments suggest that the identification of determinism
is always possible with large enough sample sizes.

It is worth mentioning that in all of the examples in this Subsection,
the filtered signals display the same correlation lengths as the
original signals.
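The stochastic test signals of this Subsection can be sketched in a few lines. The sketch assumes the recursion $y_t = (1 - \lambda) y_{t-1} + \varepsilon_t$ started from zero and uses Python's built-in generator rather than the L'Ecuyer algorithm; the function names are hypothetical.

```python
import random


def ar1_series(n, lam, noise_std=0.23, seed=0):
    """Generate n points of y_t = y_{t-1} - lam * y_{t-1} + eps_t.

    eps_t is Gaussian with standard deviation noise_std; y starts at 0.
    """
    rng = random.Random(seed)
    y = 0.0
    out = []
    for _ in range(n):
        y = y - lam * y + rng.gauss(0.0, noise_std)
        out.append(y)
    return out


def lag1_autocorrelation(y):
    """Sample autocorrelation of y at lag 1 after mean subtraction."""
    n = len(y)
    mean = sum(y) / n
    z = [v - mean for v in y]
    return sum(z[t] * z[t + 1] for t in range(n - 1)) / sum(v * v for v in z)
```

For this recursion the lag-1 autocorrelation should be close to $1 - \lambda$, so `lag1_autocorrelation(ar1_series(20000, 0.5))` is expected to be near 0.5, consistent with a correlation time of order $1/\lambda$.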

IV. Conclusion 

The proposed formalism constitutes a basis for a novel technique for
the identification of deterministic behavior in time series. A careful
study of the convergence of $L$ as the sample size grows may be used
to improve the introduced statistical test. The question of the
definition of the most adequate statistic and test to be used, e.g.
parametric or non-parametric, deserves further research. In the same
direction, statistical tests could also be made on the basis of a
comparison between the spectral properties of a signal before and
after its filtering by the KNNR algorithm.

The presented results, on the other hand, give a linear filter for
noise reduction capable of extracting features otherwise difficult to
deduce from traditional linear approaches. Further research should be
done on the use of the KNNR algorithm for noise reduction and
forecasting in important fields of application.

The presented approach treats the time series in a very direct manner.
The generalization of the KNNR algorithm to the case in which $y$
depends on more than one variable could be used to allow delay
representations of data. This may give a more powerful algorithm,
capable of identifying deterministic behavior in smaller data sets,
and of connecting the presented theory with the important problem of
the calculation of embedding dimensions. This generalization of the
study to higher dimensional data sets could also find application in
questions such as the estimation of the optimal number of hidden
neurons in models of artificial neural networks.